Objects in R

2023 Bio R Workshop

Author

Prof. Rey R. Cuenca
Math-Stat Dept., MSU-IIT

An Intuitive Framework

One way to gain a quick, if partial, understanding of a complex system of ideas is to form a simplified mental picture of it. The same approach helps when learning R, whose learning curve is quite steep:

“The learning curve for R programming is steep due to its unique syntax and extensive set of commands, requiring most new learners to spend four to six weeks mastering it.” - Noble Desktop (NYC’s Top Design & Coding School Since 1990)

An (over)simplified mental picture for R beginners is to think of working in R as cooking. Cooking essentially requires three things:

  1. Ingredients – R objects, a.k.a. “data containers”
  2. Cooking utensils/equipment – R functions
  3. Recipe – R scripts or Markdown files

Figure 1: Mental picture when working with R

You can think of RStudio’s Console and Source Panes as the cooking table of the “chef” (you).

Vectors

Probably the most fundamental object that acts as a “data container” (i.e. data structure) in R is the vector (more precisely, the atomic vector). Almost all other objects that a typical R user works with are built from vectors. Every vector has three properties:

  1. Type - typeof(), what it is
  2. Length - length(), how many elements it contains
  3. Attributes - attributes(), additional arbitrary metadata
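The three properties above can be inspected on a single object. The following is a minimal sketch; the vector `heights` and its element names are made up for illustration and do not appear elsewhere in this handout.

```r
# A hypothetical named numeric vector (names are one kind of attribute)
heights <- c(anna = 150.5, ben = 162.0, carl = 171.2)

typeof(heights)      # "double": what kind of data it holds
length(heights)      # 3: how many elements it contains
attributes(heights)  # a list with one entry, names: "anna" "ben" "carl"
```

A vector created without names, such as c(1, 2, 3), has no attributes, so attributes() returns NULL for it.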

Vectors can be created in many ways. The two most basic ways depend on the length of the vector:

  1. Length = 1. Type a single value (e.g. "a" or 15L) directly in the Console Pane and run it.
  2. Length > 1. Use the R combine function c().

Characters or Strings

"a"
"a,    b,c"
c("a","b","c")
typeof("a")
typeof("a,    b,c")
typeof(c("a","b","c"))
length("a")
length("a,    b,c")
length(c("a","b","c"))

Numbers

15L
1.0
1 + 2i

c(1L,2L,0L,-15L)
c(1.0,1,4,6,-56,1e-10,1e4)
c(1 + 2i,1,0 - 3i, 3i)
typeof(15L)
typeof(1.0)
typeof(1 + 2i)

typeof(c(1L,2L,0L,-15L))
typeof(c(1.0,1,4,6,-56,1e-10,1e4))
typeof(c(1 + 2i,1,0 - 3i, 3i))
length(15L)
length(1.0)
length(1 + 2i)

length(c(1L,2L,0L,-15L))
length(c(1.0,1,4,6,-56,1e-10,1e4))
length(c(1 + 2i,1,0 - 3i, 3i))

Logical or Boolean

T
F
TRUE
FALSE
c(T,FALSE)
c(T,T,T,T,F,FALSE,F,TRUE,T,FALSE,T)
typeof(c(T,FALSE))
length(c(T,FALSE))
attributes(c(T,FALSE))

Matrix

# Number of entries matches number of elements
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 4, byrow = TRUE)

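# Number of entries does not match the number of elements
# Resolved by recycling elements (R also warns when the matrix size
# is not a multiple of the number of elements, as in both calls below)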
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 10)
matrix(c(1,2,3,4,5,6,7,8), nrow = 2, ncol = 13, byrow = TRUE)
## Example of setting row and column names
matrix(data = c(1,2,3, 11,12,13), 
       nrow = 2, 
       ncol = 3, 
       byrow = TRUE,
       dimnames = list(c("row1", "row2"),
                       c("C.1", "C.2", "C.3")))
cbind(c(1,2,3,4), c(5,6,7,8))
rbind(c(1,2,3,4), c(5,6,7,8))
cbind(c(1,2,3,4),
      c(5,6,7,8), 
      c("A","B","C","D"))

rbind(c(1,2,3,4),
      c(5,6,7,8),
      c(T,F,T,T))

rbind(c(143,243),
      cbind(c(5,6,7,8), 
            c(T,F,T,T)))
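The matrix calls above show two behaviors that are easy to miss in the printed output: recycling and type coercion. The sketch below checks both under simple assumptions; the object names `recycled`, `mixed_chr`, and `mixed_lgl` are made up for illustration.

```r
# Recycling: 4 elements fill a 2 x 4 matrix (8 entries) by repeating themselves
recycled <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 4)
recycled             # columns 3 and 4 repeat columns 1 and 2

# Coercion: a matrix holds a single type, so mixed inputs are converted
mixed_chr <- cbind(c(1, 2), c("A", "B"))
typeof(mixed_chr)    # "character": the numbers became strings

mixed_lgl <- rbind(c(1, 2), c(TRUE, FALSE))
typeof(mixed_lgl)    # "double": TRUE became 1 and FALSE became 0
mixed_lgl
```

Because a matrix is a vector with dimensions, it follows the same one-type rule as any other atomic vector.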

Data Frame

data.frame(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
    ID  Name Age BType WVaccine
1 1103  Mark  15     A     TRUE
2 1483  John  13     O     TRUE
3 5670 Maria  16     B    FALSE
dplyr::tibble(
  ID = c(1103,1483,5670),
  Name = c("Mark","John","Maria"),
  Age = c(15L,13L,16L),
  BType = c("A","O","B"),
  WVaccine = c(T,T,F)
)
# A tibble: 3 × 5
     ID Name    Age BType WVaccine
  <dbl> <chr> <int> <chr> <lgl>   
1  1103 Mark     15 A     TRUE    
2  1483 John     13 O     TRUE    
3  5670 Maria    16 B     FALSE   

Lists

A list is a vector “on steroids”. While an atomic vector allows only a single type of data (logical, numeric, etc.), a list allows a mixture of different types. In other words, an atomic vector is a homogeneous container, while a list is a heterogeneous one.

c(1,2,3)
list(1,2,3)
c(1,"A",TRUE,c(5.4,-4.0))
list(1,"A",TRUE,c(5.4,-4.0))
typeof(list(1,"A",TRUE,c(5.4,-4.0)))
length(list(1,"A",TRUE,c(5.4,-4.0)))
attributes(list(1,"A",TRUE,c(5.4,-4.0)))
list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0))
typeof(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
length(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
attributes(list(Name1 = 1, Name2 = "A", Name3 = TRUE, Name4 = c(5.4,-4.0)))
list(Name1 = 1,
     Name2 = "A",
     Name3 = TRUE,
     Name4 = c(5.4,-4.0))
list("Name 1" = 1,
     "Name 2" = "A",
     "Name 3" = TRUE,
     "Name 4" = c(5.4,-4.0))
list(`Name 1` = 1,
     `Name 2` = "A",
     `Name 3` = TRUE,
     `Name 4` = c(5.4,-4.0))
list(
  `A vector` = 1:10,
  `A matrix` = matrix(1:9, nrow = 3),
  `A list` = list(Name1 = 1, 
                        Name2 = "A", 
                        Name3 = TRUE, 
                        Name4 = c(5.4,-4.0))
)
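The homogeneous versus heterogeneous distinction described at the start of this section can be checked directly. This is a small sketch; the names `as_vector` and `as_list` are made up for illustration.

```r
# The same four inputs, combined with c() and with list()
as_vector <- c(1, "A", TRUE, c(5.4, -4.0))
as_list   <- list(1, "A", TRUE, c(5.4, -4.0))

typeof(as_vector)        # "character": everything was coerced to one type
length(as_vector)        # 5: the inner vector was flattened

sapply(as_list, typeof)  # "double" "character" "logical" "double"
length(as_list)          # 4: the inner vector stays a single element
```

sapply() applies typeof() to each element of the list in turn, which shows that every element kept its own type.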

Variables and Constants

In computer programming, a variable is a named memory location where data is stored. Constants are entities whose values are not meant to change anywhere in the code.

x <- c(5,19,-2,0)
x
typeof(x)
length(x)
HONEY <- list(1,"A",TRUE,c(5.4,-4.0))
HONEY
typeof(HONEY)
length(HONEY)
student_data <- data.frame(
                  ID = c(1103,1483,5670),
                  Name = c("Mark","John","Maria"),
                  Age = c(15L,13L,16L),
                  BType = c("A","O","B"),
                  WVaccine = c(T,T,F)
                )
student_data
typeof(student_data)
length(student_data)
attributes(student_data)

The variables x, HONEY, and student_data are stored in the Global Environment and can be viewed in the Environment Pane.

You can also list all the variables currently stored in the Global Environment using the ls() command:

ls()

Certain rules must be followed when naming variables and constants:

  • A variable name in R can be made up of letters, digits, periods, and underscores.

  • A variable name can start with a letter or a period, but not with a digit or an underscore.

  • If a variable name starts with a period, the period cannot be immediately followed by a digit.

  • For multi-word variable names, it is advised to use underscores in place of spaces. For example, first_name, student_id, etc.

  • R is case sensitive. This means that age and Age are treated as different variables.

  • R has reserved words (e.g. if, else, function, TRUE, FALSE, NULL) that cannot be used as variable names. Other built-in names, such as T, F, c, and pi, are not reserved, but overwriting them can lead to “horrifying” consequences. You are warned!
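The naming rules above can be tried out without creating any variables. The sketch below relies on base R's make.names(), which rewrites a string into a syntactically valid name; the helper `is_valid_name` is hypothetical (it is not part of R) and simply treats a name as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: TRUE when the candidate is already a valid R name
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("first_name", "student.id", ".hidden",
                "2nd_place", ".2way", "my var", "TRUE")
is_valid_name(candidates)
# TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# Case sensitivity: age and Age are two different variables
age <- 20
Age <- 45
identical(age, Age)  # FALSE
```

The last four candidates fail for different reasons: a leading digit, a period followed by a digit, a space, and a reserved word.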

Special and Built-in R Constants

Special R Constants:

  • NULL – to declare an empty R object.

    x <- NULL
    x
    x <- c(5,NULL,-6)
    x
  • Inf / -Inf – represent positive and negative infinity, including numbers whose magnitude exceeds what the machine can represent.

    Inf
    -Inf
  • NaN (Not a Number) – represents an undefined numerical value such as 0/0 or Inf/Inf.

    NaN
    0/0
    Inf/Inf
  • NA (Not Available) – represents a value that is missing or not available.
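Unlike the other special constants, NA is listed without an example. Here is a minimal sketch of how a missing value behaves; the vector `ages` is made up for illustration.

```r
# A hypothetical integer vector with one missing value
ages <- c(15L, NA, 16L)

ages
typeof(ages)              # "integer": NA takes on the type of its vector
is.na(ages)               # FALSE TRUE FALSE
mean(ages)                # NA: one unknown value makes the mean unknown
mean(ages, na.rm = TRUE)  # 15.5: the missing value is dropped first
```

Many summary functions in R accept na.rm = TRUE to ignore missing values in this way.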

Built-in R Constants:

  • LETTERS – the 26 upper-case letters of the Roman alphabet

    LETTERS
  • letters – the 26 lower-case letters of the Roman alphabet

    letters
  • month.abb – the three-letter abbreviations for the English month names

    month.abb
  • month.name – the English names for the months of the year

    month.name
  • pi – the constant \(\pi \approx 3.1415927\), i.e. the ratio of the circumference of a circle to its diameter

    pi

NOTE: Constants are not limited to vectors. Other R objects such as matrices, lists, and data frames can be constants as well.
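R does not stop you from reassigning a name that you intend to be a constant. One possible way to enforce this, shown here only as a sketch and not as a recommended workflow for this workshop, is base R's lockBinding(). The name `MAX_AGE` and its value are made up for illustration.

```r
# A hypothetical user-defined constant
MAX_AGE <- 120

# Lock the name in the current environment so it cannot be reassigned
lockBinding("MAX_AGE", environment())

# Trying to change it now raises an error, which is caught here
attempt <- tryCatch({
  MAX_AGE <- 150
  "changed"
}, error = function(e) "blocked")

attempt  # "blocked"
MAX_AGE  # still 120
```

Calling unlockBinding("MAX_AGE", environment()) makes the name assignable again.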

Functions

Functions are objects in R that take inputs and return outputs by performing the tasks they are defined to do. A typical template for creating functions is as follows:

name_of_function <- function(var1,var2,var3,...) {
  
  # Lines of code to perform tasks with or without using var1, var2, var3, ...
  
  return(some_value_result)
}

Consider the function

\[ G(x,y) = \frac{x + y}{2} \]

This function takes two inputs \(x\) and \(y\), called the variables of \(G\) (also called the arguments of \(G\)), and returns the output \(G(x,y)\). Its task is to add the two arguments and divide the result by \(2\). In R, we can write this as follows.

G <- function(x,y) {
  
  value <- (x + y) / 2
  
  return(value)
}

A lazier version, which relies on R returning the value of the last evaluated expression, is

G <- function(x,y) {
  (x + y) / 2
}

Let’s try testing our function.

G(x = 2, y = 6) # (2 + 6) / 2
G(x = 2, y = -6) # (2 + (-6)) / 2
G(x = 2, y = 1/2) # (2 + (1/2)) / 2
G(x = 2, y = pi) # (2 + pi) / 2
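The calls above name each argument explicitly. As a short sketch, the same function also works with positional arguments, with named arguments in any order, and with whole vectors, because + and / operate element by element. G is redefined here so that the block runs on its own.

```r
G <- function(x, y) {
  (x + y) / 2
}

G(2, 6)          # 4: arguments matched by position
G(y = 6, x = 2)  # 4: named arguments can come in any order

# Vector inputs: each pair of elements is averaged
G(c(2, 10, -4), c(6, 20, 4))  # 4 15 0
```

The vector call replaces three separate calls to G with a single line.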

A Note on Advantages of using Functions

  • One advantage of R being a programming language is that, by creating functions, repetitive tasks that require running multiple lines of code can be reduced to a single line.

  • Given the large R community, it is very likely that someone has already defined a function that solves the task you have in mind. Such functions are bundled together in what we call packages. You can access a function from an installed package using the double-colon notation ::, with usage package_name::function_name, and take advantage of RStudio’s auto-completion to browse the available functions. For example, we can see the list of functions inside the package dplyr (installed along with tidyverse) by simply typing dplyr::

    dplyr::

    Figure 2: A typical output of RStudio’s auto-completion when using the double-colon on package names.

    You can also reach the same list through the Help page of the package, using the following steps:

    1. In a line of code (in either the Console or the Source Pane), position the (blinking) cursor over the name of the package and press F1.

    2. The Help Pane for the package will open. At the bottom, locate and click the link labeled “Index”. This will open the list of all functions and even constants (e.g. data sets in the form of data frames and matrices).

      Figure 3: Finding the help page for the package dplyr.
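The double-colon notation can also be tried with a package that ships with R, so nothing needs to be installed first. The sketch below uses the stats package instead of dplyr; the vector `values` is made up for illustration, and getNamespaceExports() is used here as one way to list a package's exported names outside RStudio.

```r
values <- c(5, 19, -2, 0)

# Call functions by their full package_name::function_name
stats::median(values)  # 2.5
stats::sd(values)

# List the names exported by a package, similar to the auto-completion list
exported <- sort(getNamespaceExports("stats"))
head(exported)
"median" %in% exported  # TRUE
```

Writing the package name explicitly makes it clear where a function comes from, which helps when two packages define functions with the same name.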